In this report, we aim to estimate the association between the genomic offset (GO) predictions and mortality rates in natural populations of maritime pine, used as a proxy of the absolute fitness of the populations. A positive association between GO predictions and the mortality rates would suggest that GO approaches capture maladaptation in the face of demographic complexity (i.e., the effects of other processes on spatial variance in allele frequencies, such as expansion history and gene flow) as well as the genetic architecture of climate adaptation (polygenic trait architectures, G×E interactions, nonadditive genetic variance).
For that, we use mortality data from the French and Spanish National Forest Inventories (NFI) harmonized in Changenet et al. (2021). The French data relies on temporary plots sampled between 2005 and 2014 while the Spanish data relies on permanent plots sampled during the second (from 1986 to 1996) and third NFIs (from 1997 to 2008). A tree was recorded as dead if its death was dated at less than 5 years ago in the French NFI or if it was alive in the second inventory but dead in the third one in the Spanish NFI.
The different census intervals among the plots as accounting for by modelling the proportion \(p_{i}\) of maritime pines that died in the plot \(i\) during the census interval \(\Delta_i\) with the complementary log-log link and an offset on the logarithm of the census interval \(\Delta_{i}\) for the plot \(i\). The modeling approach was evaluated with simulations in the report 14_ValidationNFI_ModelComparison.qmd.
In the present report, we evaluate the relationship between mortality rates in the NFI plots and the G0 predictions with the two following models.
Model with competition as covariate
This model corresponds to the model M3 in the report 14_ValidationNFI_ModelComparison.html.
\(N_{i}\) is the total number of maritime pines in the plot \(i\),
\(m_{i}\) is the number of maritime pines that died during the census interval \(\Delta_{i}\) in the plot \(i\),
\(\beta_{0,c}\) are country-specific intercepts that account for the methodological differences between the French and Spanish inventories that may bias the estimations,
\(C_{i}\) is the basal area of all tree species confounded in the plot \(i\) (to account for the competition among trees), with its associated coefficient \(\beta_{C}\),
\(GO_{i}\) is the genomic offset predicted in the plot \(i\), with its associated coefficient \(\beta_{GO}\).
Model with competition and tree age as covariates
This model corresponds to the model M6_2 in the report 14_ValidationNFI_ModelComparison.html.
\(DBH_{i}\) the mean diameter at breast height (DBH) of the maritime pines in the plot \(i\) (including adults, dead trees and recruitment trees) and its associated coefficient \(\beta_{DBH}\). This variable is expected to account for the different mean tree ages across plots that may impact mortality rates.
We present the results from this model in the main manuscript.
NFI data was harmonized in Changenet et al. (2021).
Code
nfi_data <-readRDS(here::here("data/NFIdata/NFIrawdata_Changenet2021.rds")) %>%droplevels() %>%as_tibble() %>% dplyr::select(plotcode, longitude, latitude, country, yearsbetweensurveys, # number of years between surveys surveydate1, # date first survey Spanish NFI surveydate, # year of the second survey in the Spanish NFI and the unique survey in the French NFI treeNbrJ.M, # number of dead trees in the plot treeNbrJ.IMall, # total number of trees in the plot BA.ha.plot.1, # basal area of all tree species in the plot dbhJ.IMall.plot.mean # mean DBH of the maritime pines in the plot including dead trees but also saplings (recruitment) ) %>% dplyr::rename(nb_years = yearsbetweensurveys,nb_dead = treeNbrJ.M,nb_tot = treeNbrJ.IMall,basal_area = BA.ha.plot.1,mean_DBH = dbhJ.IMall.plot.mean,first_survey = surveydate1,second_survey = surveydate) %>% dplyr::mutate(first_survey =case_when(!is.na(first_survey) ~str_sub(first_survey,1,4) %>%as.numeric()), # keep only the year of the 1st surveyprop_dead = nb_dead/nb_tot, # proportion of dead treesannual_prop_dead = (nb_dead/nb_tot)/nb_years) # annual proportion of dead trees
Variables in the dataset
Plot code: plotcode
Longitude and latitude of the plot: longitude and latitude.
Country in which the plot is: country (ES = Spain, FR = France).
The number of years between surveys in the Spanish inventory (which is equal to 5 in the French inventory as mortality is estimated in the five years before the survey date): nb_years.
The dates of the first and second survey in the Spanish inventory: surveydate1 and surveydate2.
The year of the second survey in the Spanish inventory and of the unique survey in the French inventory: surveydate.
The number of dead trees in the plot: nb_dead.
The total number of trees in the plot: nb_tot.
The basal area of all tree species in the plot (proxy of the competition among trees): basal_area.
The mean DBH of the maritime pines in the plot including dead trees but also saplings (recruitment): mean_DBH.
12610 plots in this dataset: 4097 from the French NFI and 8513 from the Spanish inventory.
No maritime pine was recorded in these eight plots (nb_dead = 0 and nb_tot = 0). We check that this is the case for the 664 plots by looking at the unique values in the columns nb_dead and nb_tot in the subset of plots with missing data for the proportion of dead trees (the df_na_prop_dead dataset).
nfi_data %>%ggplot(aes(x=nb_years, fill=country)) +geom_histogram(alpha=0.5, position="identity") +labs(y="Count of plots",x="Number of years between surveys") +theme_bw()
We are going to extract the climatic data between 10 years before the first survey until the date of the second survey. For the French NFI, as there is only one survey, we will extract climatic data for the 15 years before the survey.
So, we need annual climatic data at the location of the NFI plots for the period 1972-2014.
Calculate climate for each plot
Once we have the annual climatic data at the location of the NFI plots, we generate two datasets:
one with the past climate data of the NFI plots, across the period 1901-1950.
one with the climatic data across a period specific to each plot, corresponding to:
The period from 5 years before the first inventory date to the second inventory date for the Spanish NFI.
The period from 10 years before the inventory date and the inventory date for the French NFI. A tree is considered dead in the French NFI if its death is estimated as less than 5 years before the inventory date.
We included the 5 years preceding the estimated date of tree death, as the climate of the previous years may have lag effects on tree death.
We load the annual climatic data, which corresponds to point estimate at the location of the NFI plots:
We calculate the average climatic conditions across the period 1901-1950:
Code
# Function to calculate the average of the climatic variables at the location of the NFI plotssource(here("scripts/functions/calc_avg_clim_var.R"))clim_ref <- clim %>%calc_avg_clim_var(ref_period =c(1901,1950), id_spatial_points ="plotcode")
Code
# !!Coding warning!! ####################### The coordinates from ClimateDT are not exactly the same as the original ones. # Therefore, merging the climatic data with the original data has to be done with the `plotcode` column # and not the latitude and longitude of the populations.# Comparing original coordinates with the ones from ClimateDTcomp_coord <- clim_ref %>% dplyr::select(plotcode,latitude,longitude) %>% dplyr::rename(latitude_climDT=latitude, longitude_climDT=longitude)comp_coord <- nfi_data %>% dplyr::select(plotcode,latitude,longitude) %>%inner_join(comp_coord, by="plotcode") %>%mutate(diff_latitude=latitude_climDT - latitude,diff_longitude=longitude_climDT - longitude) %>% dplyr::filter(diff_latitude !=0| diff_longitude !=0)# There are very very small differencesrange(comp_coord$diff_latitude)range(comp_coord$diff_longitude)
Code
# checking time periods for Maurizio# ==================================time_periods <- clim_survey %>% dplyr::select(min_year_clim,second_survey) %>%group_by(min_year_clim,second_survey) %>%summarise(count=n()) time_periods %>%print(n=nrow(.)) %>%kable_mydf()time_periods %>%filter(count <10)time_periods %>%write_csv(here("data/ClimaticData/NFIplots/NFIplots_TimePeriods.csv"))
We then calculate the average climatic conditions across the survey periods specific to each NFI plot:
Code
# We calculate the mean climatic values between:# - five years before the first inventory date# - the second inventory dateclim_survey <- nfi_data %>%mutate(min_year_clim =case_when(country=="FR"~ second_survey-10, country=="ES"~ first_survey-5))# we calculate the average climatic conditions for each specific periodclim_survey <- clim_survey %>%group_by(min_year_clim,second_survey) %>%group_split() %>% purrr::map(\(x){min_year <-unique(x$min_year_clim)max_year <-unique(x$second_survey)clim %>%right_join(x[,"plotcode"],by=c("plotcode")) %>%calc_avg_clim_var(ref_period =c(min_year,max_year), id_spatial_points ="plotcode")}) %>%list_rbind() # we save in a list the average climatic conditions across the two periods (reference period and period of the NFI surveys)list(clim_ref = clim_ref,clim_survey = clim_survey) %>%saveRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))
We save the averaged climatic data at the locations of the NFI plots in the DRYAD repository:
dataset ClimateDT_NFIPlots_PastClimates.csv for the averaged climates over the reference period 1901-1950.
dataset ClimateDT_NFIPlots_SurveyClimates.csv" for the averaged climates over the survey periods specific to each plot.
Relationship btw GO predictions and mortality rates
Code
# Colors for the different sets of GO method (GDM, GF, LFMM and RDA) / SNP set (control and candidate SNPs, and all SNPs for LFMM)colors_coeff <-c("#004586FF","#FF7F0FFF","#FFD320FF","#579D1CFF","#7E0021FF","#83CAFFFF","#87D180FF","#FEB5A2FF","#F02720FF")
Model with competition as covariate
Stan code
Stan code of the model (model M3 in the report 14_ValidationNFI_ModelComparison.html):
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] log_nb_years; // Offset to account for different census intervals
vector[N] C; // Proxy of the competition among trees (i.e. basal area)
vector[N] GO; // Genomic offset
int nb_dead[N]; // Number of dead trees in the plot
int nb_tot[N]; // Total number of trees in the plot
int<lower=0> nb_country; // Number of countries
int<lower=0, upper=nb_country> country[N]; // Countries
}
parameters {
vector[nb_country] alpha_country;
real beta_GO;
real beta_C;
}
model {
nb_dead ~ binomial(nb_tot,inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C + log_nb_years));
alpha_country ~ normal(0, 1);
beta_GO ~ normal(0, 1);
beta_C ~ normal(0, 1);
}
Running the model
Running the model for each method (GDM, LFMM, GF and RDA) and each SNP set (all SNPs - only for LFMM-, candidate SNPs and control SNPs):
Code
# Parameters to estimateparams_to_estimate <-c("alpha_country","beta_GO","beta_C")# Running one model for each of the nine sets of method / SNP setcoefftab <-lapply(unique(df$method), function(method_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method == method_i) %>%inner_join(nfi_data, by="plotcode")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot=sub_data$nb_tot,GO = (sub_data$GO -mean(sub_data$GO))/sd(sub_data$GO),C = (sub_data$basal_area -mean(sub_data$basal_area))/sd(sub_data$basal_area),nb_country=length(unique(sub_data$country)),country=as.numeric(as.factor(sub_data$country)),log_nb_years=log(sub_data$nb_years))# Running the Stan modelmod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4,init=0, save_warmup =FALSE)# Model coefficientsconf95 <- broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = T, # effective sample size estimatesrhat = T, # Rhat estimatesconf.int = T, conf.level =0.95# include 95% credible intervals ) %>% dplyr::rename(conf95_low=conf.low,conf95_high=conf.high,mean=estimate,std_deviation=std.error) broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =TRUE, # give median and mean absolute deviation (= avg distance btw each data point and the mean)ess = F, rhat = F, conf.int = T, conf.level =0.80# include 80% credible intervals ) %>% dplyr::rename(conf80_low=conf.low,conf80_high=conf.high,median=estimate,mean_absolute_deviation=std.error) %>%inner_join(conf95,by=c("term")) %>%mutate(method=method_i,method_type=unique(sub_data$method_type),method_input=unique(sub_data$method_input),.before=1) }) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationNFI/ModelM3_coefftab.rds"))
Model coefficients
Below, we plot the 95% credible intervals of:
the \(\beta_{GO}\) coefficients, which stand for the association between mortality rates in NFI plots and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations at the location of the NFI plots).
the \(\beta_C\) coefficients, which stands for the association between the tree basal area (all species confounded) in the NFI plots and the mortality rates.
As expected, the basal area is positively associated with mortality rates.
For the four methods (GDM, LFMM, GF and RDA), populations with higher genomic offset based on the candidate SNPs (and all SNPs for LFMM) had higher mortality rates in the NFI plots. Interestingly, this was not the case with genomic offset predictions based on the control SNPs: for GF and LFMM, populations with higher genomic offset based on the control SNPs also showed higher mortality rates but for GDM and RDA, populations with higher genomic offset showed lower mortality rates in the NFI plots. This confirms the importance of selecting a set of alleles potentially under selection before using the genomic offset predictions to predict maladaptation in natural populations (or using all SNPs with the LFMM approach).
We can also note the very high association between mortality rates and the genomic offset predictions based on candidate SNPs and from the GF and GDM methods.
\(\beta_{GO}\) estimates against a null distribution
We evaluated the probability that the estimated relationship between the mortality rates and GO predictions could be obtained by chance. For that, we ran again the model on the NFI data but we replaced GO predictions by a randomly generated variable \(X\) such as \(X \sim \mathcal{N}(0,1)\). We repeated this step 100 times and obtained a distribution of \(\beta_X\) coefficients capturing the relationship between mortality rates and a randomly generated variable. We then compared this distribution of \(\beta_X\) coefficients to the estimated \(\beta_{GO}\) coefficients capturing the relationship between the mortality rates and the GO predictions.
Code
# Load the coefficient values of models with predicted GOcoefftab <-readRDS(file=here("outputs/ValidationNFI/ModelM3_coefftab.rds")) %>%mutate(run_ID=method,GO="predicted") %>% dplyr::filter(term =="beta_GO") %>% dplyr::select(run_ID, GO, mean, contains("conf"))# Load the coefficient values of models with random GO and merge with the predicted GO coefficient tablecoefftab <-readRDS(file=here(paste0("outputs/ValidationNFI/Simulations/m3_truedata_randomGO_100simulations.rds"))) %>%bind_rows(.id="sim_ID") %>% dplyr::filter(term =="beta_GO") %>%mutate(run_ID =paste0("sim ",sim_ID),#run_ID = paste0("Simulation ",sim_ID),GO="random") %>% dplyr::select(run_ID, GO, mean, contains("conf")) %>%bind_rows(coefftab)# Plot the mean and 95% credible intervals with interval plotsp <- coefftab %>%ggplot(aes(mean, reorder(run_ID, mean))) +geom_vline(xintercept =0, color="gray30") +geom_errorbar(aes(xmin = conf95_low, xmax = conf95_high, color=GO)) +scale_color_manual(values=c("magenta","gray"),name ="Genomic offset") +geom_point() +theme_bw() +labs(x="GO effect size", y="") +theme(axis.text.y =element_text(size=6))# save the plot in pdfggsave(p,file=here(paste0("figs/ValidationNFI/ModelM3_IntervalPlots_RandomGO_vs_PredictedGO.pdf")),device="pdf",height=10,width=8)p
This graph shows that it is very unlikely that the relationship between the mortality rates and GO predictions of GDM + candidate SNPs, GF + candidate SNPs, RDA + candidate SNPs and GDM + control SNPs are obtained by chance.
Model with competition and tree age as covariates
Stan code
Stan code of the model (model M6_2 in the report 14_ValidationNFI_ModelComparison.html):
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] log_nb_years; // Offset to account for different census intervals
vector[N] C; // Proxy of the competition among trees (i.e. basal area)
vector[N] DBH; // Proxy of the average tree age (i.e. mean DBH)
vector[N] GO; // Genomic offset
int nb_dead[N]; // Number of dead trees in the plot
int nb_tot[N]; // Total number of trees in the plot
int<lower=0> nb_country; // Number of countries
int<lower=0, upper=nb_country> country[N]; // Countries
}
parameters {
vector[nb_country] alpha_country;
real beta_GO;
real beta_C;
real beta_DBH;
real beta_C_DBH;
}
model {
nb_dead ~ binomial(nb_tot,inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C + beta_DBH * DBH + beta_C_DBH * C .* DBH + log_nb_years));
alpha_country ~ normal(0, 1);
beta_GO ~ normal(0, 1);
beta_C ~ normal(0, 1);
beta_DBH ~ normal(0, 1);
beta_C_DBH ~ normal(0, 1);
}
Running the model
Running the model for each method (GDM, LFMM, GF and RDA) and each input (all SNPs - only for LFMM-, candidate SNPs and control SNPs):
Code
# Parameters to estimateparams_to_estimate <-c("alpha_country","beta_GO","beta_C", "beta_DBH","beta_C_DBH")# Running one model for each of the nine sets of method / SNP setcoefftab <-lapply(unique(df$method), function(method_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method == method_i) %>%inner_join(nfi_data, by="plotcode")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot=sub_data$nb_tot,GO = (sub_data$GO -mean(sub_data$GO))/sd(sub_data$GO),C = (sub_data$basal_area -mean(sub_data$basal_area))/sd(sub_data$basal_area),DBH = (sub_data$mean_DBH -mean(sub_data$mean_DBH))/sd(sub_data$mean_DBH),nb_country=length(unique(sub_data$country)),country=as.numeric(as.factor(sub_data$country)),log_nb_years=log(sub_data$nb_years))# Running the Stan modelmod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4,init=0, save_warmup =FALSE)# Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationNFI/StanModels/M62_",unique(sub_data$method_type),"_",unique(sub_data$method_input),".rds")))# Model coefficientsconf95 <- broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = T, # effective sample size estimatesrhat = T, # Rhat estimatesconf.int = T, conf.level =0.95# include 95% credible intervals ) %>% dplyr::rename(conf95_low=conf.low,conf95_high=conf.high,mean=estimate,std_deviation=std.error) broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =TRUE, # give median and mean absolute deviation (= avg distance btw each data point and the mean)ess = F, rhat = F, conf.int = T, conf.level =0.80# include 80% credible intervals ) %>% dplyr::rename(conf80_low=conf.low,conf80_high=conf.high,median=estimate,mean_absolute_deviation=std.error) %>%inner_join(conf95,by=c("term")) %>%mutate(method=method_i,method_type=unique(sub_data$method_type),method_input=unique(sub_data$method_input),.before=1) }) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationNFI/ModelM62_coefftab.rds"))
Predictions of tree mortality probability
Plots
Let’s visualize the uncertainty around the estimation of the probability of tree mortality \(p\).
Code
############################################################################################# Visualizing the relationships between GO predictions and mortality probability predictions############################################################################################df <- df %>%mutate(method_id=paste0(method_type,"_",method_input))go_methods <-list(LFMM_all ="GO predictions from LFMM and all SNPs",LFMM_cand ="GO predictions from LFMM and candidate SNPs",LFMM_control ="GO predictions from LFMM and control SNPs",RDA_cand ="GO predictions from RDA and candidate SNPs",RDA_control ="GO predictions from RDA and control SNPs",GDM_cand ="GO predictions from GDM and candidate SNPs",GDM_control ="GO predictions from GDM and control SNPs",GF_cand ="GO predictions from GF and candidate SNPs",GF_control ="GO predictions from GF and control SNPs")myplots <-lapply(unique(df$method_id), function(method_i){# We load the stanliststanlist <-readRDS(file =here(paste0("outputs/ValidationNFI/StanModels/M62_",method_i,".rds")))$stanlist# We load the modelmod <-readRDS(file =here(paste0("outputs/ValidationNFI/StanModels/M62_",method_i,".rds")))$mod# We extract the posterior distributions of all parameterspost <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions)x_vec <-seq(min(stanlist$GO),max(stanlist$GO),0.05)# Function to predict the mortality probability p with the initial tree height fixed to the meanp_pred <-lapply(c("alpha_country[1]","alpha_country[2]"), function(country_i){ f_p <-function(x) 1-(exp(-exp(post[,country_i] + post$`beta_DBH`*mean(stanlist$DBH) + post$`beta_C`*mean(stanlist$C) + post$`beta_C_DBH`*mean(stanlist$C) *mean(stanlist$DBH) + post$`beta_GO`* x +log(5))))sapply(x_vec, f_p) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, p, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(p, credMass =0.90)[1],hpdi_h = HDInterval::hdi(p, credMass =0.90)[2],p_mean =mean(p)) %>%ungroup() %>%mutate(x =as.numeric(x),Country=ifelse(country_i=="alpha_country[1]","Spain","France"))}) %>%bind_rows()# Keep mean and 90% HDPI of p (one value for each iteration)p_mean_df <- p_pred %>% dplyr::select(x,hpdi_l,hpdi_h,p_mean,Country) %>% dplyr::distinct()# Plots#=======# Plot options y_limits <-c(0,0.2)x_axis <-"Mean-centered GO predictions"y_axis <-"Probability of tree mortality"p <-ggplot() +ylim(y_limits) +labs(y=y_axis, x=x_axis) +theme_bw() +theme(legend.position=c(0.8,0.85),legend.text =element_text(size=13),legend.title =element_text(size=13),legend.background =element_blank())# First 100 posterior draws of the mortality probability pp1 <- p +geom_point(data = p_pred %>%filter(Iter <101),aes(x, p, color=Country), alpha = .2) # Mean (line) and 90% HPDI (shaded region) of the mortality probability pp2 <- p +geom_ribbon(data = p_mean_df,aes(x = x, ymin = hpdi_l, ymax = hpdi_h, fill=Country),alpha = .2) +geom_line(data = p_mean_df,aes(x=x, y=p_mean,col=Country))p1_p2 <-ggarrange(p1, p2 +labs(y=""), nrow =1)# Titletitle <-ggdraw() +draw_label(paste0(go_methods[[method_i]]),fontface ='bold',x =0, hjust =0) +theme(plot.margin =margin(0, 0, 0, 7))# merge title and plotsp1_p2 <-plot_grid(title, p1_p2, ncol =1, rel_heights =c(0.1, 1))p2 <- p2 +labs(subtitle = go_methods[[method_i]])ggsave(p1_p2,file=here(paste0("figs/ValidationNFI/MortalityProbabilityPredictions_UncertaintyIntervals/M62_",method_i,".pdf")),device="pdf",height=6,width=11)return(p2)})p <-plot_grid(myplots[[1]] +labs(x="") +theme(legend.position ="none"), myplots[[2]] +labs(y="",x="") +theme(legend.position ="none"), myplots[[3]] +labs(y="",x=""), myplots[[4]] +labs(x="") +theme(legend.position ="none"), myplots[[5]] +labs(y="",x="") +theme(legend.position ="none"), myplots[[6]] +labs(y="",x="") +theme(legend.position ="none"), myplots[[7]] +theme(legend.position ="none"), myplots[[8]] +labs(y="") +theme(legend.position ="none"), myplots[[9]] +labs(y="") +theme(legend.position ="none"),ncol=3)ggsave(p,file=here(paste0("figs/ValidationNFI/MortalityProbabilityPredictions_M62.pdf")),device="pdf",height=12,width=12)
Table
In the table below:
mean and 90% credible intervals (highest posterior density intervals) of \(p\) for \(x=\{-1, 0, 1\}\).
percent change in \(p\) associated with a one-unit increase in \(x\) (which corresponds to a one-standard deviation increase).
Code
p_pred <-lapply(unique(df$method_id), function(method_i){# We load the stanliststanlist <-readRDS(file =here(paste0("outputs/ValidationNFI/StanModels/M62_",method_i,".rds")))$stanlist# We load the modelmod <-readRDS(file =here(paste0("outputs/ValidationNFI/StanModels/M62_",method_i,".rds")))$mod# We extract the posterior distributions of all parameterspost <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions)x_vec <-c(-1,0,1)# Function to predict the mortality probability p with the initial tree height fixed to the meanp_pred <-lapply(c("alpha_country[1]","alpha_country[2]"), function(country_i){ f_p <-function(x) 1-(exp(-exp(post[,country_i] + post$`beta_DBH`*mean(stanlist$DBH) + post$`beta_C`*mean(stanlist$C) + post$`beta_C_DBH`*mean(stanlist$C) *mean(stanlist$DBH) + post$`beta_GO`* x +log(5))))sapply(x_vec, f_p) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, p, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(p, credMass =0.90)[1],hpdi_h = HDInterval::hdi(p, credMass =0.90)[2],p_mean =mean(p)) %>%ungroup() %>%mutate(x =as.numeric(x),Country=ifelse(country_i=="alpha_country[1]","Spain","France"))}) %>%bind_rows() %>% dplyr::select(x,p_mean,hpdi_l,hpdi_h,Country) %>%distinct() %>%mutate_if(is.numeric, round, digits=3)lapply(unique(p_pred$Country), function(country_i){p_pred_sub <- p_pred %>% dplyr::filter(Country==country_i)delta_p_pos <- (p_pred_sub$p_mean[p_pred_sub$x==1] *100/ p_pred_sub$p_mean[p_pred_sub$x==0])-100delta_p_neg <- (p_pred_sub$p_mean[p_pred_sub$x==-1] *100/ p_pred_sub$p_mean[p_pred_sub$x==0])-100tibble("Method"= go_methods[[method_i]],"Country"=country_i,"p(GO=-1)"=paste0(p_pred_sub$p_mean[p_pred_sub$x==-1], " [", p_pred_sub$hpdi_l[p_pred_sub$x==-1],";", p_pred_sub$hpdi_h[p_pred_sub$x==-1],"]"),"p(GO=0)"=paste0(p_pred_sub$p_mean[p_pred_sub$x==0], " [", p_pred_sub$hpdi_l[p_pred_sub$x==0],";", p_pred_sub$hpdi_h[p_pred_sub$x==0],"]"),"p(GO=1)"=paste0(p_pred_sub$p_mean[p_pred_sub$x==1], " [", p_pred_sub$hpdi_l[p_pred_sub$x==1],";", p_pred_sub$hpdi_h[p_pred_sub$x==1],"]"),"+Δp"=paste0(round(delta_p_pos,3),"%"),"-Δp"=paste0(round(delta_p_neg,3),"%"))}) %>%bind_rows()}) %>%bind_rows()p_pred %>%saveRDS(file =here("tables/MortalityPredNFI.rds"))
the \(\beta_{GO}\) coefficients, which stand for the association between mortality rates in NFI plots and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations at the location of the NFI plots).
the \(\beta_C\) coefficients, which stands for the association between the tree basal area (all species confounded) in the NFI plots and the mortality rates.
the \(\beta_{DBH}\) coefficients, which stands for the association between the mean DBH (diameter at breast height) of all maritime pines in the NFI plots (including adults, dead trees and seedlings/saplings) and the mortality rates.
the \(\beta_{C*DBH}\) coefficients, which stands for the interaction between the mean DBH and the tree basal area.
\(\beta_{GO}\) estimates against a null distribution
We evaluated the probability that the estimated relationship between the mortality rates and GO predictions could be obtained by chance. For that, we ran again the model on the NFI data but we replaced GO predictions by a randomly generated variable \(X\) such as \(X \sim \mathcal{N}(0,1)\). We repeated this step 100 times and obtained a distribution of \(\beta_X\) coefficients capturing the relationship between mortality rates and a randomly generated variable. We then compared this distribution of \(\beta_X\) coefficients to the estimated \(\beta_{GO}\) coefficients capturing the relationship between the mortality rates and the GO predictions.
Code
# Load the coefficient values of models with predicted GOcoefftab <-readRDS(file=here("outputs/ValidationNFI/ModelM62_coefftab.rds")) %>%mutate(run_ID=method,GO="predicted") %>% dplyr::filter(term =="beta_GO") %>% dplyr::select(run_ID, GO, mean, contains("conf"))# Load the coefficient values of models with random GO and merge with the predicted GO coefficient tablecoefftab <-readRDS(file=here(paste0("outputs/ValidationNFI/Simulations/m6_2_truedata_randomGO_100simulations.rds"))) %>%bind_rows(.id="sim_ID") %>% dplyr::filter(term =="beta_GO") %>%mutate(run_ID =paste0("sim ",sim_ID),#run_ID = paste0("Simulation ",sim_ID),GO="random") %>% dplyr::select(run_ID, GO, mean, contains("conf")) %>%bind_rows(coefftab)# Plot the mean and 95% credible intervals with interval plotsp <- coefftab %>%ggplot(aes(mean, reorder(run_ID, mean))) +geom_vline(xintercept =0, color="gray30") +geom_errorbar(aes(xmin = conf95_low, xmax = conf95_high, color=GO)) +scale_color_manual(values=c("magenta","gray"),name ="Genomic offset") +geom_point() +theme_bw() +labs(x="GO effect size", y="") +theme(axis.text.y =element_text(size=6))# save the plot in pdfggsave(p,file=here(paste0("figs/ValidationNFI/ModelM62_IntervalPlots_RandomGO_vs_PredictedGO.pdf")),device="pdf",height=10,width=8)p
GO predictions in Galicia
GO predictions in Galicia (especially those obtained with GF) show a strong boundary between two zones (see Figure 4b of the manuscript). How can we explain this pattern?
In Galicia, GO predictions from the GF approach (see Figure 4b of the manuscript) are higher for plots surveyed for the second time in 1998 than plots surveyed for the second time in 1997.
We explore the climatic differences between the two sets of plots, ie plots surveyed in 1997 vs 1998. First, we extract the annual climatic values at the location of the Galician plots sampled in 1997 and 1998 and plot their values across the study period (ie between the first and second surveys).
Code
# Extracting annual climatic values at the location of the NFI plots in Galicia surveyed for the second time inm 1997 or 1998subclim <- clim %>%left_join(nfi_data, by=c("plotcode","longitude","latitude")) %>% dplyr::filter(second_survey %in%1997:1998) %>% dplyr::filter(year %in%min(first_survey):1998) %>% dplyr::select(second_survey,year,all_of(clim_var))# violin_plots <- lapply(clim_var, function(x) {# # var_name <- extract_climatedt_metadata(var_clim = x) %>% # mutate(var_legend= paste0(description, " (", label,"; ",unit_symbol,")")) %>% # pull(var_legend)# # subclim %>% # mutate(year=as.factor(year)) %>% # ggplot(aes_string(y=x,x="year")) + # geom_jitter(shape=16,aes(colour=factor(second_survey)),position=position_jitter(0.2),size=2,alpha=0.3) +# xlab("") + # scale_color_manual(name="Year of the second survey",# values = c("yellow2","darkorchid")) +# ggtitle(var_name) +# ylab("mean climate under survey period - mean past climate") +# theme_bw()# # })violin_plots <-lapply(clim_var, function(x) {var_name <-extract_climatedt_metadata(var_clim = x) %>%mutate(var_legend=paste0(description, " (", label,"; ",unit_symbol,")")) %>%pull(var_legend)subclim %>%mutate(year=as.factor(year),second_survey=as.factor(second_survey)) %>%ggplot(aes_string(y=x,x="year",color="second_survey")) +geom_point(shape=16,position=position_jitter(0.2),size=2,alpha=0.1) +geom_violin(alpha=0.3,fill="white", linewidth=0.7) +xlab("") +ylab("") +ggtitle(var_name) +scale_color_manual(name="Year of the second survey",values =c("yellow2","darkorchid")) +theme_bw() +guides(colour =guide_legend(override.aes =list(size=2)))})
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Code
violin_plots
From these plots, it seems that the plots sampled in 1997 have lower isothermality (bio3) and lower temperature seasonality (bio4).
We also compare the mean climatic values of the reference and survey periods between the plots surveyed in 1997 and 1998. For that, we calculate for each plot (and each climatic variable) the difference between its mean climate across the reference period (ie 1901-1950) and its mean climate across the survey period (between the first and second survey period).
Code
# We load the datasets with mean climatic values across the reference and survey periods# Then we calculate the difference between the mean climates of the two periods for each plot# We then keep only plots surveyed (second survey) in 1997 or 1998 (Galician plots)subnficlim <-readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds")) %>%bind_rows() %>%group_by(plotcode,longitude,latitude,elevation) %>%summarise(across(tmn01:Eref, diff)) %>%ungroup() %>%left_join(nfi_data, by=c("plotcode","longitude","latitude")) %>% dplyr::filter(second_survey %in%1997:1998)bg_color <-"gray44"# background colorviolin_plots <-lapply(clim_var, function(x) {var_name <-extract_climatedt_metadata(var_clim = x) %>%mutate(var_legend=paste0(description, " (", label,"; ",unit_symbol,")")) %>%pull(var_legend)subnficlim %>%mutate(second_survey=as.factor(second_survey)) %>%ggplot(aes_string(y=x,x="second_survey")) +geom_violin(alpha=0.2, fill="gray76", color="darkgreen") +geom_jitter(shape=16,position=position_jitter(0.2),size=2,alpha=0.4) +xlab("") +ggtitle(var_name) +ylab("mean climate under survey period - mean past climate") +theme_bw()})violin_plots
These plots show that plots surveyed in 1998 show higher increase in the mean annual temperature (bio1), lower increase in annual precipitation (bio12) and stronger decrease in temperature seasonality (bio4) than plots surveyed in 1997. As temperature seasonality is the most important variable to explain the allelic turnover in GF, it may explain the strong differences in GO predictions among plots surveyed in 1997 and 1998.
Changenet, Alexandre, Paloma Ruiz-Benito, Sophia Ratcliffe, Thibaut Fréjaville, Juliette Archambeau, Annabel J Porte, Miguel A Zavala, Jonas Dahlgren, Aleksi Lehtonen, and Marta Benito Garzón. 2021. “Occurrence but Not Intensity of Mortality Rises Towards the Climatic Trailing Edge of Tree Species Ranges in European Forests.”Global Ecology and Biogeography.